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Abstract: 

We study the thermodynamic behavior of a model protein with 54 amino acids that 
forms a three-helix bundle in its native state. The model contains three types of 
amino acids and five to six atoms per amino acid and has the Ramachandran torsional 
angles (pi, ipi as its degrees of freedom. The force field is based on hydrogen bonds 
and effective hydrophobicity forces. For a suitable choice of the relative strength of 
these interactions, we find that the three-helix-bundle protein undergoes an abrupt 
folding transition from an expanded state to the native state. Also shown is that 
the corresponding one- and two-helix segments are less stable than the three-helix 
sequence. 



*E-mail: irback, fredriks, stefan@thep.lu.se 



1 



1 Introduction 



It is not yet possible to simulate the formation of proteins' native structures on the 
computer in a controlled way. This goal has been achieved in the context of simple 
lattice and off-lattice models, where typically each amino acid is represented by a 
single interaction site corresponding to the C a atom, and such studies have provided 
valuable insights into the physical principles of protein folding Jl|-|5[ and the statistical 
properties of functional protein sequences |§, [/]]. However, these models have their 
obvious limitations. Therefore, the search for computationally feasible models with 
a more realistic chain geometry remains a highly relevant task. 



In this paper, we discuss a model based on the well-known fact that the main degrees 
of freedom of the protein backbone are the Ramachandran torsional angles 4>i,ipi |§. 
Each amino acid is represented by five or six atoms, which makes this model com- 
putationally slightly more demanding than C a models. On the other hand, it also 
makes interactions such as hydrogen bonds easier to define. The formation of na- 
tive structure is, in this model, driven by hydrogen-bond formation and effective 
hydrophobicity forces; hydrophobicity is widely held as the most important stability 
factor in proteins || [HJ , and hydrogen bonds are essential to properly model the 
formation of secondary structure. 



In this model, we study in particular a three-helix-bundle protein with 54 amino 
acids, which represents a truncated and simplified version of the four-helix-bundle 
protein de novo designed by Regan and DeGrado |lTfl . This example was chosen 



partly because there have been earlier studies of similar-sized helical proteins using 
models at comparable levels of resolution ]T^-^] . The behavior of small fast-folding 
proteins is a current topic in both theoretical and experimental research, and a three- 
helix-bundle protein that has been extensively studied both experimentally [[HS|,|2"(] 
and theoretically fll||17|,|2~l|,|2"2"l is fragment B of staphylococcal protein A. 



In addition to the three-helix protein, to study size dependence, we also look at the 
behavior of the corresponding one- and two-helix segments. By using the method 
of simulated tempering |[23|-p5*|, a careful study of the thermodynamic properties of 
these different chains is performed. 



Not unexpectedly, it turns out that the behavior of the model depends strongly 
on the relative strength of the hydrogen-bond and hydrophobicity terms. In fact, 
the situation is somewhat reminiscent of what has been found for homopolymers 
with stiffness [p6H29|, with hydrogen bonds playing the role of the stiffness term. 
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Figure 1: Schematic figure showing the representation of one amino acid. 



Throughout this paper, we focus on one specific empirical choice of these parameters. 

For this choice of parameters, we find that the three-helix-bundle protein has the 
following three properties. First, it does form a stable three- helix bundle (except 
for a 2- fold topological degeneracy). Second, its folding transition is abrupt, from 
an expanded state to the native three-helix-bundle state. Third, compared to the 
one- and two-helix segments, it forms a more stable secondary structure. It should 
be stressed that these properties are found without resorting to the popular Go 
approximation | S0| , in which interactions that do not favor the desired structure are 
ignored. 



2 The Model 



The model we study is a reduced off-lattice model. The chain representation is 
illustrated in Fig. As mentioned in the introduction, each amino acid is represented 
by five or six atoms. The three backbone atoms N, C a and C are all included. Also 
included are the H and O atoms shown in Fig. [I], which we use to define hydrogen 
bonds. Finally, the side chain is represented by a single atom, Cp, which can be 
hydrophobic, polar or absent. This gives us the following three types of amino acids: 
A with hydrophobic Cp, B with polar C^, and G (glycine) without Cp. 

The H, O and Cp atoms are all attached to the backbone in a rigid way. Furthermore, 
in the backbone, all bond lengths, bond angles and peptide torsional angles (180°) 
are held fixed. This leaves us with two degrees of freedom per amino acid, the 
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Bond lengths (A) Bond angles (°) 





1.46 


C'NC a 


121.7 


c a c 


1.52 


NC a C 


111.0 


CN 


1.33 


C a CN 


116.6 


NH 


1.03 




110.0 


C Q C<3 


1.53 


n/n /"i 


110.0 


CO 


1.23 







Table 1: Geometry parameters. 



Ramachandran torsional angles fa and ipi (see Fig. [l]). The parameters held fixed can 
be found in Table |I| 

Our energy function 

E = E\ oc + E sa + E hh + Eaa (1) 

is composed of four terms. The local potential E\ oc has a standard form with 3-fold 
symmetry, 

£ioc = I £(! + cos 30,) + y E(! + cos3 ^) • (2) 
The self-avoidance term i£ sa is given by a hard-sphere potential of the form 

^a^saE'f-) 12 ' (3) 

where the sum runs over all possible atom pairs except those consisting of two hy- 
drophobic Cp. The hydrogen-bond term E^ is given by 

E hh = Chb u ( r ij) v ( a ij> Pij) > ( 4 ) 



„M = 5f^V 2 -6^V° (5) 



where 



/ a \ \ COS 2 OLij COS 2 f3ij Oiij, Pij > 90° , * 

= \0 otherwise (6) 

In Eq. H i and j represent H and O atoms, respectively, and denotes the HO 
distance, the NHO angle, and /3y the HOC angle. Any HO pair can form a 
hydrogen bond. The last term in Eq. [I], the hydrophobicity term E^A: has the form 



Eaa — caa E 

i<j 



V TV,- / V fj, / 



(7) 
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o i (A) 

£<$> H £ sa 6 hb €AA N C a C H Cp O CT hh (A) CT AA (A) 

1 1 0.0034 2.8 2.2 1.65 1.85 1.85 1.0 2.5 1.65 2.0 5.0 

Table 2: Parameters of the energy function. Energies are in dimensionless units, in 
which the folding transition occurs at kT rs 0.65 for the three-helix-bundle protein 
(see below). 



where both i and j represent hydrophobic Cp. To speed up the simulations, a cutoff 
radius r c is used|] which is 4.5A for E S3i and E hh , and 8A for £aa- 

In this energy function, roughly speaking, the first two terms, E\ oc and E S3j , enforce 
steric constraints, whereas the last two terms, E h b and -Baa, are the ones responsible 
for stability. Force fields similar in spirit, emphasizing hydrogen bonding and hy- 
drophobicity, have been used with some success to predict structures of peptides f3"l|| 



and small helical proteins [15 . 



The parameters of our energy function were determined largely by trial and error. 
The final parameters are listed in Table The parameters of Eq. |3| are given by 

Cy = CTj + CTj + A(7ij , 

where <Ji,<ij can be found in Table 0, and A<7jj is zero except for C^C, CgN and 
C^O pairs that are connected by three covalent bonds. In these three cases, we put 
Aery = 0.625A. This could equivalently be described as a change of the local fa and 
ipi potentials. In Fig. |^, we show fa, fa scatter plots for nonglycine (A and B) and 
glycine for our final parameters, which are in good qualitative agreement with the 



fa, fa distributions of real proteins |8L 32 



Finally, we determined the strengths of the hydrogen-bond and hydrophobicity terms 
on the basis of the resulting overall thermodynamic behavior of the three-helix se- 
quence. For this purpose, we performed a set of trial runs for fixed values of the other 
parameters. An alternative would have been to use the method of Shea et al. | 33fl . The 



result of our empirical determination of ehb and caa does not seem unreasonable; at 
the folding temperature of the three-helix sequence (see below), we get e^/kT w 4.3 
and e AA /kT w 3.4. 

In this model, we study the three sequences shown in Table E| which contain 16, 35 



^The cutoff procedure is f(r) i— > f(r) where f(r) = f(r) — f(r c ) — (r — r c )f'(r c ) if r < r c and 
f(r) = otherwise. 
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NON-GIY 



GLY 




Figure 2: 4>i,ijji scatter plots for nonglycine and glycine, as obtained by simulations 
of the chains GXG for X=A/B and X=G, respectively, at kT = 0.625 (shown is <f>i, ipi 
for X). 

1H: BBABBAABBABBAABB 

2H: 1H-GGG-1H 

3H: 1H-GGG-1H -GGG 1H 



Table 3: The sequences studied. 



and 54 amino acids, respectively. Following the strategy of Regan and DeGrado [5 I 



the A and B amino acids are distributed along the sequence 1H in such a way that 
this segment can form a helix with all hydrophobic amino acids on the same side. The 
sequence 3H, consisting of three such stretches of As and Bs plus two GGG segments, 
is meant to form a three-helix bundle. This particular sequence was recently studied 



by Takada et al. [18], who used a more elaborate model with nonadditive forces. 



3 Results 



To study the thermodynamic behavior of the chains described in the previous section, 
we use the method of simulated tempering. This means that we first select a set 
of allowed temperatures and then perform simulations in which the temperature is 
a dynamical variable. This is done to speed up low-temperature simulations. In 
addition, it provides a convenient method for calculating free energies. 
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Figure 3: Monte Carlo evolution of the energy and radius of gyration in a typ- 
ical simulation of the three-helix sequence. The bottom panel shows how the 
system jumps between the allowed temperatures Tj, which are given by Tj = 
T ata (T msK /T a i a )V- 1 VV- 1 ') II with kT min = 0.625, kT m&x = 0.9 and J = 8. The 
temperature T min is chosen to lie just below the collapse transition, whereas T max is 
well into the coil phase (see Fig. ||). 



An example of a simulated-tempering run is given in Fig. [J[ which shows the Monte 
Carlo evolution of the energy E and radius of gyration R g (calculated over all back- 
bone atoms) in a simulation of the three-helix sequence. Also shown, bottom panel, 
is how the system jumps between the different temperatures. Two distinct types of 
behavior can be seen. In one case, E is high, fluctuations in size are large, and the 
temperatures visited are high. In the other case, E is low, the size is small and almost 
frozen, and the temperatures visited are low. Interesting to note is that there is one 
temperature, the next-lowest one, which is visited in both cases. Apparently, both 
types of behavior are possible at this temperature. 

In Fig. |]a we show the specific heat as a function of temperature for the one-, two- 
and three-helix sequences. A pronounced peak can be seen that gets stronger with 
increasing chain length. In fact, the increase in height is not inconsistent with a linear 
dependence on chain length, which is what one would have expected if it had been a 
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Figure 4: Thermodynamic functions against temperature for the sequences 1H (o), 
2H (x) and 3H (+) in Table g. (a) Specific heat C v = ((E 2 ) - (E) 2 )/NkT 2 , N being 
the number of amino acids, (b) Hydrogen-bond energy per amino acid, E^/N. (c) 
Chain entropy per amino acid, 5S/N = [S — S(kT = 0.9)]/N. The full lines in (a) 
represent single-histogram extrapolations f55|| . Dotted lines are drawn to guide the 
eye. 



conventional first-order phase transition with a latent heat. 

Our results for the radius of gyration (not displayed) show that the specific heat 
maximum can be viewed as the collapse temperature. The specific heat maximum is 
also where hydrogen-bond formation occurs, as can be seen from Fig. |]b. Important 
to note in this figure is that the decrease in hydrogen-bond energy per amino acid with 
decreasing temperature is most rapid for the three- helix sequence, which implies that, 
compared to the shorter ones, this sequence forms more stable secondary structure. 
The results for the chain entropy shown in Fig. provide further support for this; the 
entropy loss per amino acid with decreasing temperature is largest for the three-helix 
sequence. 

It should be stressed that the character of the collapse transition depends strongly 
on the relative strength of the hydrogen-bond and hydrophobicity terms. Figure f| 
shows that the transition is very abrupt or "first-order- like" for our choice (e^b, ^aa) = 
(2.8, 2.2). A fairly small decrease of ehb/^AA is sufficient to get a very different behav- 
ior with, for example, a much weaker peak in the specific heat. In this case, the chain 
collapses to a molten globule without specific structure rather than to a three-helix 
bundle. A substantially weakened transition was observed for ehb = ^aa = 2.5. If, on 
the other hand, ehb/^AA is too large, then it is evident that the chain will form one 
long helix instead of a helical bundle. 
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Figure 5: Representative low-temperature structures, FU and BU, respectively. 
Drawn with RasMol JHJ. 

We now turn to the three-dimensional structure of the three-helix sequence in the 
collapsed phase. It turns out that it does form a three-helix bundle. This bundle can 
have two distinct topologies: if we let the first two helices form a U, then the third 
helix can be either in front of or behind that U. The model is, not unexpectedly, un- 
able to discriminate between these two possibilities. To characterize low-temperature 
conformations, we therefore determined two representative structures, one for each 
topology, which, following ||18|| , are referred to as FU and BU, respectively. These 
structures are shown in Fig. |5|. They were generated by quenching a large number of 
low-T structures to zero temperature, and we feel convinced that they provide good 
approximations of the energy minima for the respective topologies. Given an arbi- 
trary conformation, we then measure the root-mean-square distances 5i (i = FU, BU) 
to these two structures (calculated over all backbone atoms). These distances are 
converted into similarity parameters Qi by using 

Qi = exp(-^ 2 /100A 2 ) . (8) 

At temperatures above the specific heat maximum, both Qi tend to be small. At 
temperatures below this point, the system is found to spend most of its time close 
to one or the other of the representative structures; either Qfu or Qbu is close to 1. 
Finally, at the peak, all three of these regions in the Qfv,Qbv plane are populated, 
as can be seen from Fig. ^|a. In particular, this implies that the folding transition 
coincides with the specific heat maximum. 

The folding transition can be described in terms of a single "order parameter" by 
taking Q = max(Qpu,QBu) as a measure of nativeness. Correspondingly, we put 
5 = min(<5 FU , <5bu)- I n Fig- §]b> we show the free-energy profile F{Q) at the folding 
temperature. The free energy has a relatively sharp minimum at Q ~ 0.9, correspond- 
ing to 5 ~ 3A. This is followed by a weak barrier around Q = 0.7, corresponding to 
5 ~ 6A. Finally, there is a broad minimum at small Q, where Q = 0.2 corresponds 
to 5 « 13A. 
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Figure 6: (a) Qfv, Qbv (see Eq. H) scatter plot at the specific heat maximum (kT = 
0.658). (b) Free energy F(Q) as a function of Q = max(QFix, Qbv) at the same 
temperature. 
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Figure 7: (a) Q, R g and (b) Q, E hh scatter plots at the folding temperature (kT 
0.658). 



What does the nonnative population at the folding temperature correspond to in 
terms of R g and -Ehb? This can be seen from the Q, R g and Q, E^o scatter plots in 
Fig. [7[ These plots show that the low-Q minimum of F(Q) corresponds to expanded 
structures with a varying but not high secondary-structure content. Although a 
detailed kinetic study is beyond the scope of this paper, we furthermore note that 
the free-energy surfaces corresponding to the distributions in Fig. [7] are relatively 
smooth. Consistent with that, we found that standard fixed-temperature Monte 
Carlo simulations were able to reach the native state, starting from random coils. 
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Let us finally mention that we also performed simulations of some random sequences 
with the same length and composition as the three-helix sequence. The random 
sequences did not form stable structures and collapsed more slowly with decreasing 
temperature than the designed three-helix sequence. 



4 Summary and Outlook 



We have studied a reduced protein model where the formation of native structure 
is driven by a competition between hydrogen bonds and effective hydrophobicity 
forces. Using this force field, we find that the three-helix-bundle protein studied has 
the following properties: 



• It does form a stable three-helix-bundle state, except for a 2-fold topological 
degeneracy. 

• It undergoes an abrupt folding transition from an expanded state to the native 
state. 

• It forms more stable secondary structure than the corresponding one- and two- 
helix segments. 



An obvious question that remains to be addressed is what is needed to lift the topolog- 
ical degeneracy. Not obvious, however, is whether this question should be addressed 
at the present level of modeling, before including full side chains. 

A first-order- like folding transition that takes the system directly from the unfolded 
state to the native one is what one expects for small fast-folding proteins. For the 
model to show this behavior, careful tuning of the relative strength of the hydrogen- 
bond and hydrophobicity terms, ehb/^AA, is required. This ehb/eAA dependence may 
at first glance seem unwanted but is not physically unreasonable; ehb can be thought 
of partly as a stiffness parameter, and chain stiffness has important implications for 
the phase structure, as shown by recent work on homopolymers f26^p9|. Note also 
that incorporation of full side chains makes the chains intrinsically stiffer, which 
might lead to a weaker ehb/eAA dependence. 



Our three- helix sequence has previously been studied by Takada et al. []n| , who used 
a more elaborate force field. It was suggested that it is essential to use context- 
dependent hydrogen bonds for the three-helix-bundle protein to make more stable 
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secondary structure than its one-helix fragments. Our model shows this behavior, 
although its hydrogen bonds are context-independent. 

Let us finally stress that we find a first-order-like folding transition without using the 
Go approximation. Evidence for first-order-like folding transitions has been found 
for proteins with similar lengths in some C a models P,|14|,p|,[33[], but these studies 
use this approximation. 
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